Sequence recognition for automatic vector removal from contigs. Preparing sequence recognition for automatic vector removal

Sequence recognition for automatic vector removal

with DNA Sequence Assembler

Content

1. General info about the pGEM-T Easy vector
2. Designing recognition sequences for the pGEM-T Easy vector
3. Using recognition sequences with DNA Sequence Assembler (a tutorial showing that the recognition sequences designed in step 2 work)

1. General info about the pGEM-T Easy vector

The sequence of the pGEM®-T Easy Vector is shown below. The vector is supplied in linear form to facilitate ligation of the insert. It has been linearized with EcoRV at base 60 of this sequence (indicated by an asterisk, *), and a T has been added to both 3' ends (the added T is not included in the sequence given below). The insert will therefore be ligated at position 60, where the asterisk is found.

> pGEM-T Easy Vector (Promega corporation)

1 GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG
51 GGAATTCGAT* ATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT
101 GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC
151 ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT
201 TGTTATCCGC TCACAATTCC ACACAACATA CGAGCCGGAA GCATAAAGTG
251 TAAAGCCTGG GGTGCCTAAT GAGTGAGCTA ACTCACATTA ATTGCGTTGC
301 GCTCACTGCC CGCTTTCCAG TCGGGAAACC TGTCGTGCCA GCTGCATTAA
351 TGAATCGGCC AACGCGCGGG GAGAGGCGGT TTGCGTATTG GGCGCTCTTC
401 CGCTTCCTCG CTCACTGACT CGCTGCGCTC GGTCGTTCGG CTGCGGCGAG
451 CGGTATCAGC TCACTCAAAG GCGGTAATAC GGTTATCCAC AGAATCAGGG
501 GATAACGCAG GAAAGAACAT GTGAGCAAAA GGCCAGCAAA AGGCCAGGAA
551 CCGTAAAAAG GCCGCGTTGC TGGCGTTTTT CCATAGGCTC CGCCCCCCTG
601 ACGAGCATCA CAAAAATCGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA
651 GGACTATAAA GATACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC
701 TCCTGTTCCG ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT
751 CGGGAAGCGT GGCGCTTTCT CATAGCTCAC GCTGTAGGTA TCTCAGTTCG
801 GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT GTGCACGAAC CCCCCGTTCA
851 GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG TCCAACCCGG
901 TAAGACACGA CTTATCGCCA CTGGCAGCAG CCACTGGTAA CAGGATTAGC
951 AGAGCGAGGT ATGTAGGCGG TGCTACAGAG TTCTTGAAGT GGTGGCCTAA
1001 CTACGGCTAC ACTAGAAGAA CAGTATTTGG TATCTGCGCT CTGCTGAAGC
1051 CAGTTACCTT CGGAAAAAGA GTTGGTAGCT CTTGATCCGG CAAACAAACC
1101 ACCGCTGGTA GCGGTGGTTT TTTTGTTTGC AAGCAGCAGA TTACGCGCAG
1151 AAAAAAAGGA TCTCAAGAAG ATCCTTTGAT CTTTTCTACG GGGTCTGACG
1201 CTCAGTGGAA CGAAAACTCA CGTTAAGGGA TTTTGGTCAT GAGATTATCA
1251 AAAAGGATCT TCACCTAGAT CCTTTTAAAT TAAAAATGAA GTTTTAAATC
1301 AATCTAAAGT ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGCTTAA
1351 TCAGTGAGGC ACCTATCTCA GCGATCTGTC TATTTCGTTC ATCCATAGTT
1401 GCCTGACTCC CCGTCGTGTA GATAACTACG ATACGGGAGG GCTTACCATC
1451 TGGCCCCAGT GCTGCAATGA TACCGCGAGA CCCACGCTCA CCGGCTCCAG
1501 ATTTATCAGC AATAAACCAG CCAGCCGGAA GGGCCGAGCG CAGAAGTGGT
1551 CCTGCAACTT TATCCGCCTC CATCCAGTCT ATTAATTGTT GCCGGGAAGC
1601 TAGAGTAAGT AGTTCGCCAG TTAATAGTTT GCGCAACGTT GTTGCCATTG
1651 CTACAGGCAT CGTGGTGTCA CGCTCGTCGT TTGGTATGGC TTCATTCAGC
1701 TCCGGTTCCC AACGATCAAG GCGAGTTACA TGATCCCCCA TGTTGTGCAA
1751 AAAAGCGGTT AGCTCCTTCG GTCCTCCGAT CGTTGTCAGA AGTAAGTTGG
1801 CCGCAGTGTT ATCACTCATG GTTATGGCAG CACTGCATAA TTCTCTTACT
1851 GTCATGCCAT CCGTAAGATG CTTTTCTGTG ACTGGTGAGT ACTCAACCAA
1901 GTCATTCTGA GAATAGTGTA TGCGGCGACC GAGTTGCTCT TGCCCGGCGT
1951 CAATACGGGA TAATACCGCG CCACATAGCA GAACTTTAAA AGTGCTCATC
2001 ATTGGAAAAC GTTCTTCGGG GCGAAAACTC TCAAGGATCT TACCGCTGTT
2051 GAGATCCAGT TCGATGTAAC CCACTCGTGC ACCCAACTGA TCTTCAGCAT
2101 CTTTTACTTT CACCAGCGTT TCTGGGTGAG CAAAAACAGG AAGGCAAAAT
2151 GCCGCAAAAA AGGGAATAAG GGCGACACGG AAATGTTGAA TACTCATACT
2201 CTTCCTTTTT CAATATTATT GAAGCATTTA TCAGGGTTAT TGTCTCATGA
2251 GCGGATACAT ATTTGAATGT ATTTAGAAAA ATAAACAAAT AGGGGTTCCG
2301 CGCACATTTC CCCGAAAAGT GCCACCTGAT GCGGTGTGAA ATACCGCACA
2351 GATGCGTAAG GAGAAAATAC CGCATCAGGA AATTGTAAGC GTTAATATTT
2401 TGTTAAAATT CGCGTTAAAT TTTTGTTAAA TCAGCTCATT TTTTAACCAA
2451 TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT AGACCGAGAT
2501 AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG
2551 TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA
2601 CTACGTGAAC CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA
2651 AGCACTAAAT CGGAACCCTA AAGGGAGCCC CCGATTTAGA GCTTGACGGG
2701 GAAAGCCGGC GAACGTGGCG AGAAAGGAAG GGAAGAAAGC GAAAGGAGCG
2751 GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG TAACCACCAC
2801 ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCATT CGCCATTCAG
2851 GCTGCGCAAC TGTTGGGAAG GGCGATCGGT GCGGGCCTCT TCGCTATTAC
2901 GCCAGCTGGC GAAAGGGGGA TGTGCTGCAA GGCGATTAAG TTGGGTAACG
2951 CCAGGGTTTT CCCAGTCACG ACGTTGTAAA ACGACGGCCA GTGAATTGTA
3001 ATACGACTCA CTATA

2. How to design recognition sequences

Step I: Identify the base positions where the insert will be ligated

In our example, it is position 60 (marked by an asterisk *):

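> pGEM-T Easy Vector (Promega corporation)

1 GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG
51 GGAATTCGAT* ATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT
101 GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC
151 ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT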
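As a cross-check on Step I, the cut position can also be located programmatically. The sketch below is only an illustration: the names `VECTOR_HEAD`, `ECORV_SITE` and `ecorv_cut_position` are hypothetical, and it relies on the fact that EcoRV recognizes GATATC and cuts blunt in the middle of the site (GAT^ATC). Only the first 100 bases of the vector are used.

```python
# First 100 bases of the vector as printed above (asterisk and spaces removed).
VECTOR_HEAD = (
    "GGGCGAATTGGGCCCGACGTCGCATGCTCCCGGCCGCCATGGCGGCCGCG"
    "GGAATTCGATATCACTAGTGAATTCGCGGCCGCCTGCAGGTCGACCATAT"
)

# EcoRV recognition site; the enzyme cuts blunt in the middle: GAT^ATC.
ECORV_SITE = "GATATC"


def ecorv_cut_position(sequence: str) -> int:
    """Return the 1-based position of the last base before the EcoRV cut.

    Only the first occurrence of the site is considered (illustrative sketch).
    """
    site_start = sequence.find(ECORV_SITE)  # 0-based index, -1 if absent
    if site_start == -1:
        raise ValueError("no EcoRV site found in the sequence")
    return site_start + len(ECORV_SITE) // 2


cut = ecorv_cut_position(VECTOR_HEAD)
print(cut)                    # 60
print(VECTOR_HEAD[:cut][-10:])  # GGAATTCGAT (bases 51-60, just before the asterisk)
```

The result agrees with the asterisk placed after base 60 in the printed sequence.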

Step II: Add a T at the end of each flanking region

The T overhang is added at the 3' end of each strand. Before the asterisk (the 3' end of the given DNA strand) we therefore add a T. After the asterisk (the 5' end of the given DNA strand) we add an A, which corresponds to the T at the 3' end of the complementary strand.

> pGEM-T Easy Vector (Promega corporation)

1 GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG
51 GGAATTCGATT* AATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT
101 GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC
151 ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT
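The T-tailing described in Step II can be sketched in a few lines. This is a minimal illustration, not part of the software: `VECTOR_HEAD`, `CUT_POSITION` and `t_tailed_flanks` are hypothetical names, and only the first 100 bases of the vector are used.

```python
# First 100 bases of the vector (asterisk and spaces removed).
VECTOR_HEAD = (
    "GGGCGAATTGGGCCCGACGTCGCATGCTCCCGGCCGCCATGGCGGCCGCG"
    "GGAATTCGATATCACTAGTGAATTCGCGGCCGCCTGCAGGTCGACCATAT"
)
CUT_POSITION = 60  # the insert is ligated after base 60 (the asterisk)


def t_tailed_flanks(sequence: str, cut_position: int) -> tuple[str, str]:
    """Split the vector at the cut and add the overhang bases.

    The 1st flank gets a T at its 3' end. The 2nd flank gets an A at its
    5' end, which mirrors the T at the 3' end of the complementary strand.
    """
    left = sequence[:cut_position] + "T"
    right = "A" + sequence[cut_position:]
    return left, right


left_flank, right_flank = t_tailed_flanks(VECTOR_HEAD, CUT_POSITION)
print(left_flank[-11:])  # GGAATTCGATT
print(right_flank[:11])  # AATCACTAGTG
```

The two printed fragments match the groups on either side of the asterisk in the modified sequence above.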

Step III:

Identify the F recognition sequence by selecting 15-20 bases in the 1st flanking region (the bases just before the asterisk, including the added T)

> pGEM-T Easy Vector (Promega corporation)

1 GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG

51 GGAATTCGATT* AATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT

101 GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC

151 ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT

The F recognition sequence is: GGCCGCGGGAATTCGATT

Step IV:

Select 15-20 bases in the 2nd flanking region (the bases just after the asterisk, including the added A)

> pGEM-T Easy Vector (Promega corporation)

1 GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG

51 GGAATTCGATT* AATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT

101 GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC

151 ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT

The selected bases in the 2nd flanking region are: AATCACTAGTGAATTCGC
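Steps III and IV amount to taking the last bases of the 1st flanking region and the first bases of the 2nd. A minimal sketch, assuming the T-tailed flanks are already available as plain strings (`LEFT_FLANK`, `RIGHT_FLANK` and `select_flanking_bases` are hypothetical names; the length of 18 is the one used in this example):

```python
# T-tailed flanks from Step II (spaces removed).
LEFT_FLANK = (
    "GGGCGAATTGGGCCCGACGTCGCATGCTCCCGGCCGCCATGGCGGCCGCG"
    "GGAATTCGATT"
)
RIGHT_FLANK = "AATCACTAGTGAATTCGCGGCCGCCTGCAGGTCGACCATAT"


def select_flanking_bases(left: str, right: str, length: int = 18) -> tuple[str, str]:
    """Return (last `length` bases of left, first `length` bases of right)."""
    if length > min(len(left), len(right)):
        raise ValueError("requested length exceeds the available flank")
    return left[-length:], right[:length]


f_seq, second_flank_selection = select_flanking_bases(LEFT_FLANK, RIGHT_FLANK)
print(f_seq)                   # GGCCGCGGGAATTCGATT
print(second_flank_selection)  # AATCACTAGTGAATTCGC
```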

Step V:

Visually check that the F recognition sequence differs from the selected bases in the 2nd flanking region

The F recognition sequence is: GGCCGCGGGAATTCGATT
The selected bases in the 2nd flanking region are: AATCACTAGTGAATTCGC

If they differ, move on to Step VI. If they are identical, repeat Steps III and IV, this time selecting longer sequences.

Step VI:

Obtain the R recognition sequence by taking the reverse complement of the bases selected from the 2nd flanking region (Step IV)

The selected bases in the 2nd flanking region are: AATCACTAGTGAATTCGC
The R recognition sequence = the reverse complement of the selected bases in the 2nd flanking region = GCGAATTCACTAGTGATT
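The reverse complement in Step VI can be computed with the Python standard library alone. The helper below is a small sketch; the name `reverse_complement` is illustrative, and it handles only the unambiguous bases A, C, G and T.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Whitespace (such as the 10-base grouping used above) is removed first.
    """
    cleaned = "".join(sequence.split()).upper()
    return cleaned.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AATCACTAGTG AATTCGC"))  # GCGAATTCACTAGTGATT
```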

Step VII:

Visually check that the F and the R recognition sequences are not identical

The F recognition sequence is: GGCCGCGGGAATTCGATT
The R recognition sequence is: GCGAATTCACTAGTGATT

If they differ, you are done. If they are identical, repeat Steps III to VI, this time selecting longer sequences.
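The "repeat with longer sequences" rule in Steps V and VII can be expressed as a loop. The following is a sketch under stated assumptions, not the procedure used by the software: `design_recognition_pair` and its parameters are hypothetical, the flanks are the T-tailed strings from Step II, and the checks are the two identity comparisons described above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement for sequences made of A, C, G and T."""
    return sequence.translate(_COMPLEMENT)[::-1]


def design_recognition_pair(left: str, right: str, start: int = 15, stop: int = 20):
    """Return (F, R), lengthening the selection until both checks pass."""
    for length in range(start, stop + 1):
        f_seq = left[-length:]                # Step III
        selection = right[:length]            # Step IV
        if f_seq == selection:                # Step V
            continue
        r_seq = reverse_complement(selection)  # Step VI
        if f_seq != r_seq:                    # Step VII
            return f_seq, r_seq
    raise ValueError("no distinct F/R pair found in the given length range")


# T-tailed flanks from Step II (spaces removed).
LEFT_FLANK = (
    "GGGCGAATTGGGCCCGACGTCGCATGCTCCCGGCCGCCATGGCGGCCGCG"
    "GGAATTCGATT"
)
RIGHT_FLANK = "AATCACTAGTGAATTCGCGGCCGCCTGCAGGTCGACCATAT"

f_seq, r_seq = design_recognition_pair(LEFT_FLANK, RIGHT_FLANK, start=18)
print(f_seq)  # GGCCGCGGGAATTCGATT
print(r_seq)  # GCGAATTCACTAGTGATT
```

Starting at 18 bases reproduces the pair obtained by hand in this tutorial. With the default start of 15, the function would return the first (shorter) pair that passes both checks.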

3. How to input and use the recognition sequence with DNA Baser Assembler

Step I: Define your vectors

Click the Tasks button to open the 'Tasks' panel.

In the 'Tasks' panel, choose the desired task from the 'Sequence processing' or 'Mutation detection' section.

The PROJECT MANAGER window will open.

Click the 'Vector Removal' tab:

[Screenshot: the 'Vector Removal' tab]

In the Vector Removal tab you will be able to enter your vector recognition sequence(s).

In the "Add new recognition sequence" box, enter a name for the F recognition sequence and its nucleotides.

Press the ADD button to add the sequence into the 'Current recognition sequences' list:

[Screenshot: the Sequence Recognition & Vector Removal tab]

In the "Vector cleaning" box, choose whether to remove or keep the recognition sequence when it is found. The vector is removed in both cases; only the recognition sequence itself is kept if you choose that option.

[Screenshot: the cut or keep recognition sequence options]

Repeat the operation for the R recognition sequence. Make sure that both recognition sequences are active (the check box in front of each one is checked):

[Screenshot: the pGEM-T Easy vector recognition sequences in the 'Current recognition sequences' list]

Press the APPLY button to save the settings.

Note: For details about vector removal and sequence recognition, see the Vector Removal page.

Step II: Sequence assembly

Click the PROJECT BUILDER tab, navigate to the folder containing your sequences and add them into the JOB LIST.

Now you are ready to start the sequence assembly (or mutation detection) by pressing the "Start sequence assembly" button.


Step III: Contig inspection

If the sequence assembly was successful, the 'Assembly Window' will open. Here you can see whether the recognition sequences were found and the vector was removed. The recognition sequences are marked in blue, and the vector bases are shown in strikethrough. The vector will be cut from your contig automatically when you save the contig to disk.

[Screenshot: the recognition sequences were found and the vector was removed]

In the screenshot above, the recognition sequence was not removed, as per user choice (see Step I). If you want it to be removed, you need to select the 'Cut recognition sequence' option in the 'Vector Removal' tab, BEFORE assembling the sequences.
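To make the effect of the 'cut' and 'keep' options concrete, here is a conceptual sketch of trimming a contig with the two recognition sequences. It is not the program's actual algorithm, which is not described here. It assumes exact matches only, and `trim_vector`, `keep_recognition` and the short insert sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement for sequences made of A, C, G and T."""
    return sequence.translate(_COMPLEMENT)[::-1]


def trim_vector(contig: str, f_seq: str, r_seq: str, keep_recognition: bool = False) -> str:
    """Remove vector bases outside the F and R recognition sequences.

    F is searched as given. R is searched as its reverse complement,
    because it was designed on the opposite strand.
    """
    start, end = 0, len(contig)
    f_at = contig.find(f_seq)
    if f_at != -1:
        start = f_at if keep_recognition else f_at + len(f_seq)
    r_on_strand = reverse_complement(r_seq)
    r_at = contig.rfind(r_on_strand)
    if r_at != -1 and r_at >= start:
        end = r_at + len(r_on_strand) if keep_recognition else r_at
    return contig[start:end]


F_SEQ = "GGCCGCGGGAATTCGATT"
R_SEQ = "GCGAATTCACTAGTGATT"
INSERT = "ATGAAACCCGGGTTTTAG"  # made-up insert, for illustration only

# Vector bases on both sides of the insert, taken from the T-tailed flanks.
contig = "CCATGGCGGCCGCGGGAATTCGATT" + INSERT + "AATCACTAGTGAATTCGCGGCCGCCTG"

print(trim_vector(contig, F_SEQ, R_SEQ))                         # insert only
print(trim_vector(contig, F_SEQ, R_SEQ, keep_recognition=True))  # F + insert + R site
```

With `keep_recognition=False` only the insert remains. With `keep_recognition=True` the recognition sequences stay attached to it, which corresponds to the situation shown in the screenshot.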

In this example, both recognition sequences were found.

Custom support

If you need custom support with tasks similar to this one (sequence assembly, vector removal, primer design, automation of sequence cleaning/processing jobs, etc.), we can provide it at an affordable price.
